27 research outputs found

    Synchronization / communication techniques for OmpSs@FPGA

    HPC machines are introducing more and more heterogeneity in their architecture on the road to exascale systems. The increasing complexity of the machines, due to the variety of hardware architectures and accelerators, makes efficient programming harder than ever. Heterogeneous parallel programming models, such as OmpSs@FPGA, help the programmer handle the most unfriendly parts of working with accelerators. This master's thesis analyzes the OmpSs@FPGA communication system and proposes a set of techniques to overcome its problems and potentially improve application performance. The results show that the proposed techniques speed up the applications under certain conditions and, most importantly, solve some of the limitations of the previous communication system. In particular, the new techniques especially improve the exploitation of fine-grain parallelism and open the door to exploring new possibilities with regard to data communication and reuse. Moreover, a tool (autoVivado) that automatically manages the process of bitstream generation, from the synthesis of the HLS code to the generation of the device tree, has been developed as part of this master's thesis. autoVivado has been fully integrated with the OmpSs@FPGA compiler infrastructure, giving programmers a way to transparently generate parallel heterogeneous programs and bitstreams from OmpSs applications that use FPGA accelerators.

    Implementation of the K-Means Algorithm on Heterogeneous Devices: A Use Case Based on an Industrial Dataset

    This paper presents and analyzes a heterogeneous implementation of an industrial use case based on K-means that targets symmetric multiprocessing (SMP), GPUs and FPGAs. We present how the application can be optimized from an algorithmic point of view and how this optimization performs on two heterogeneous platforms. The presented implementation relies on the OmpSs programming model, which introduces a simplified pragma-based syntax for the communication between the main processor and the accelerators. The programmer can improve performance by explicitly specifying the data memory accesses or copies. As expected, the newer SMP+GPU system studied is more powerful than the older SMP+FPGA system. However, the latter is enough to fulfill the requirements of our use case, and we show that it uses less energy when considering only the active power of the execution. This work is partially supported by the European Union H2020 project AXIOM (grant agreement n. 645496), HiPEAC (grant agreement n. 687698), and Mont-Blanc (grant agreements n. 288777, 610402 and 671697), the Spanish Government Programa Severo Ochoa (SEV-2015-0493), the Spanish Ministry of Science and Technology (TIN2015-65316-P) and the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051). Peer reviewed. Postprint (author's final draft).

    Picos, a hardware task-dependence manager for task-based dataflow programming models

    Task-based programming models such as OpenMP, Intel TBB and OmpSs are widely used to extract high levels of parallelism from applications executed on multi-core and manycore platforms. These programming models allow applications to be expressed as a set of tasks with dependences that drive their execution at runtime. While managing these dependences for tasks with coarse granularity proves to be highly beneficial, it introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we propose Picos, a hardware/software co-design that manages inter-task dependences efficiently. In this paper we describe the main ideas of our proposal and a prototype implementation. This prototype is integrated with a parallel task-based programming model and evaluated with real executions on a Linux embedded system with two ARM Cortex-A9 cores and an FPGA. When compared with a software runtime, our solution results in more than 1.8x speedup and 40% energy savings with only 2 threads. This work is supported by the projects SEV-2015-0493 and TIN2015-65316-P, by the projects 2014-SGR-1051 and 2014-SGR-1272, by the RoMoL GA 321253 and by the project cooperation agreement with LG Electronics. The authors also thank the Xilinx University Program. Postprint (published version).

    Application Acceleration on FPGAs with OmpSs@FPGA

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. OmpSs@FPGA is the flavor of OmpSs that allows offloading application functionality to FPGAs. Similarly to OpenMP, it is based on compiler directives. While the OpenMP specification also includes support for heterogeneous execution, we use OmpSs and OmpSs@FPGA as prototype implementations to develop new ideas for OpenMP. OmpSs@FPGA implements the tasking model with runtime support to automatically exploit all SMP and FPGA resources available in the execution platform. In this paper, we present the OmpSs@FPGA ecosystem, based on the Mercurium compiler and the Nanos++ runtime system. We show how the applications are transformed to run on the SMP cores and the FPGA. The application kernels defined as tasks to be accelerated using the OmpSs directives are: 1) transformed by the compiler into kernels connected with the proper synchronization and communication ports, 2) extracted to intermediate files, 3) compiled through the FPGA vendor HLS tool, and 4) used to configure the FPGA. Our Nanos++ runtime system schedules the application tasks on the platform and is able to use the SMP cores and the FPGA accelerators at the same time. We present the evaluation of the OmpSs@FPGA environment with the Matrix Multiplication, Cholesky and N-Body benchmarks, showing the internal details of the execution and the performance obtained on a Zynq Ultrascale+ MPSoC (up to 128x). The source code uses OmpSs@FPGA annotations, and different Vivado HLS optimization directives are applied for acceleration. This work is partially supported by the European Union H2020 program through the EuroEXA project (grant 754337), and HiPEAC (GA 687698), by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology (TIN2015-65316-P) and the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051). Peer reviewed. Postprint (author's final draft).

    The AXIOM software layers

    The AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA). The software layers developed in the AXIOM project are explained. OmpSs provides an easy way to execute heterogeneous codes on multiple cores. People and objects will soon share the same digital network for information exchange in a world known as the age of cyber-physical systems. The general expectation is that people and systems will interact in real time. This puts pressure on system design to support increasing demands on computational power while keeping a low power envelope. Additionally, modular scaling and easy programmability are also important to ensure that these systems become widespread. The whole set of expectations imposes scientific and technological challenges that need to be properly addressed. The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet such expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this aim, an innovative ARM and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics). Peer reviewed. Postprint (author's final draft).

    New insights into the genetic etiology of Alzheimer's disease and related dementias

    Characterization of the genetic landscape of Alzheimer's disease (AD) and related dementias (ADD) provides a unique opportunity for a better understanding of the associated pathophysiological processes. We performed a two-stage genome-wide association study totaling 111,326 clinically diagnosed/'proxy' AD cases and 677,663 controls. We found 75 risk loci, of which 42 were new at the time of analysis. Pathway enrichment analyses confirmed the involvement of amyloid/tau pathways and highlighted the implication of microglia. Gene prioritization in the new loci identified 31 genes that were suggestive of new genetically associated processes, including the tumor necrosis factor alpha pathway through the linear ubiquitin chain assembly complex. We also built a new genetic risk score associated with the risk of future AD/dementia or progression from mild cognitive impairment to AD/dementia. The improvement in prediction led to a 1.6- to 1.9-fold increase in AD risk from the lowest to the highest decile, in addition to effects of age and the APOE ε4 allele.

    Multiancestry analysis of the HLA locus in Alzheimer’s and Parkinson’s diseases uncovers a shared adaptive immune response mediated by HLA-DRB1*04 subtypes

    Across multiancestry groups, we analyzed Human Leukocyte Antigen (HLA) associations in over 176,000 individuals with Parkinson’s disease (PD) and Alzheimer’s disease (AD) versus controls. We demonstrate that the two diseases share the same protective association at the HLA locus. HLA-specific fine-mapping showed that hierarchical protective effects of HLA-DRB1*04 subtypes best accounted for the association, strongest with HLA-DRB1*04:04 and HLA-DRB1*04:07, and intermediary with HLA-DRB1*04:01 and HLA-DRB1*04:03. The same signal was associated with decreased neurofibrillary tangles in postmortem brains, with reduced tau levels in cerebrospinal fluid and, to a lesser extent, with increased Aβ42. Protective HLA-DRB1*04 subtypes strongly bound the aggregation-prone tau PHF6 sequence, but only when it was acetylated at a lysine (K311), a common posttranslational modification central to tau aggregation. An HLA-DRB1*04-mediated adaptive immune response decreases PD and AD risks, potentially by acting against tau, offering the possibility of therapeutic avenues.

    Anàlisi i optimització de l'aplicació bioinformàtica de docking: Lightdock

    This project analyzes the performance of Lightdock, a protein docking application, and details a set of optimizations aimed at reducing the program's execution time. Finally, a set of improvements is proposed to further optimize the program.
